feat(wire): create connections in batch #865
Davidson-Souza wants to merge 4 commits into getfloresta:master
Conversation
Force-pushed 898303b to 1db21e3
```rust
let good_peers_count = self.connected_peers();
if good_peers_count > T::MAX_OUTGOING_PEERS {
    debug!(
        "Already have {} peers, disconnecting peer to avoid blowing up our max of {}",
        self.peers.len(),
        T::MAX_OUTGOING_PEERS
    );

    self.send_to_peer(peer, NodeRequest::Shutdown)?;

    return Ok(());
}
```
Hm, I think it's better to take into account how many peers we could connect to before exceeding the limit BEFORE trying more connections, see next comment.
Also, to control excess peers we can disconnect slow peers in IBD and random peers once synced... follow-up PR I have already done privately.
```rust
let connection_kind = ConnectionKind::Regular(required_service);
if self.peers.len() < T::MAX_OUTGOING_PEERS {
    for _ in 0..NEW_CONNECTIONS_BATCH_SIZE {
        self.create_connection(connection_kind)?;
    }
}
```
I had this in a private branch, I think it makes sense:

```rust
// How many peers we can connect to without exceeding the limit, if any
let peer_capacity = T::MAX_OUTGOING_PEERS.saturating_sub(self.connected_peers());
let connection_kind = ConnectionKind::Regular(required_service);

// Try connecting to at most 4 peers (fewer if max capacity would be reached)
for _ in 0..peer_capacity.min(MAX_OPEN_CONNECTIONS_BATCH) {
    self.create_connection(connection_kind)?;
}
```
I've tried this, but for the last connections we basically revert to the older logic. I had some runs where I would create 8/9 connections super quick, and then just hang for dozens of seconds.
Using this logic we can get 10 peers for chain selection pretty quickly.
So you exceed the limit on purpose, then you disconnect the excess peers. Btw the 8/9 fast connections happen to me on master too, idk why it takes longer to get the 10th peer.
> So you exceed the limit on purpose then you disconnect the exceeded peers.

Yes! It should not get bigger than 20 tho, because we also abort the connection pretty quickly if they don't respond.

> Btw the 8/9 fast connections happens to me on master, idk why it takes longer to get the 10th peer.

For me as well, I think it has to do with the address manager getting populated by the addresses received from new connections? It's really weird tho. At least for me it doesn't happen on this PR.
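The overshoot-then-trim approach being discussed can be sketched as two tiny helpers. This is illustrative only: `MAX_OUTGOING_PEERS` and `NEW_CONNECTIONS_BATCH_SIZE` mirror the PR's constants, but `attempts_this_round` and `excess_to_trim` are hypothetical names, not the node's API.

```rust
// Sketch only: in the real code these are `NodeContext` associated constants.
const MAX_OUTGOING_PEERS: usize = 10;
const NEW_CONNECTIONS_BATCH_SIZE: usize = 10;

/// Fire a full batch of attempts whenever we are under the limit,
/// accepting a temporary overshoot that gets trimmed later.
fn attempts_this_round(connected: usize) -> usize {
    if connected >= MAX_OUTGOING_PEERS {
        0
    } else {
        NEW_CONNECTIONS_BATCH_SIZE
    }
}

/// Once peers become ready, disconnect enough of them to get back
/// under the limit.
fn excess_to_trim(connected: usize) -> usize {
    connected.saturating_sub(MAX_OUTGOING_PEERS)
}
```

Compared to capping the batch by remaining capacity, this keeps the early rounds fast (always a full batch of attempts) at the cost of briefly holding more peers than the limit.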
```rust
if self.peers.len() >= T::MAX_OUTGOING_PEERS {
    return Ok(());
}
```
Then we will keep connecting and disconnecting peers, I think it's fine
Edit: outside of the feelers logic
```rust
/// How many connections we try at once
const NEW_CONNECTIONS_BATCH_SIZE: usize = 10;
```
Since we would sometimes try fewer connections, this is more accurate. I think 4 is kinda nice, but this is just how I feel:
```rust
/// The maximum number of connections we try to open in parallel, in `maybe_open_connection`.
const MAX_OPEN_CONNECTIONS_BATCH: usize = 4;
```
Maybe this could be part of NodeContext. Then use 4 for all but ChainSelector?
ACK, what if we used 12 for ChainSelector so that 1-2 failed connections don't make us wait 10 seconds (also a nice multiple of 4). We keep the first 10 peers. Not sure.
```rust
impl NodeContext for ChainSelector {
    const REQUEST_TIMEOUT: u64 = 60; // Ban peers stalling our IBD
    const TRY_NEW_CONNECTION: u64 = 1; // Try creating connections more aggressively
```
I would keep this with 4 peers as batch size
I've increased this to make sure we GC the dead peers before attempting a new round. We remove all pending `Connect`s before trying to add more, in order to keep the in-flight connections list small.
```rust
let good_peers_count = self.connected_peers();
if good_peers_count > T::MAX_OUTGOING_PEERS {
    debug!(
        "Already have {} peers, disconnecting peer to avoid blowing up our max of {}",
```
If `good_peers_count` is the value triggering the log, isn't it more accurate for it to be:

```rust
debug!(
    "Already have {good_peers_count} peers, disconnecting peer to avoid blowing up our max of {}",
    T::MAX_OUTGOING_PEERS
);
```
```rust
/// Attempt to open a new connection (if needed) every TRY_NEW_CONNECTION seconds
const TRY_NEW_CONNECTION: u64 = 10; // 10 seconds
```
What if we move it here:

```rust
/// How long should we wait for a peer to respond our connection request. This shouldn't be
/// greater than `TRY_NEW_CONNECTION` in order to clear timed-out requests at the same pace.
const CONNECTION_TIMEOUT: u64 = 10;
```
Good idea! Will do
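The invariant stated in that doc comment (the timeout must not outlive the retry interval) can even be checked at compile time. A hedged sketch, with the two values inlined as free constants rather than the real `NodeContext` associated constants:

```rust
// Illustrative only: in the real code these live on the `NodeContext` trait;
// here they are free constants so the check stands alone.
const TRY_NEW_CONNECTION: u64 = 10;
const CONNECTION_TIMEOUT: u64 = 10;

// Fails the build if the connection timeout outlives the retry interval,
// so timed-out attempts are always cleared before the next batch fires.
const _: () = assert!(CONNECTION_TIMEOUT <= TRY_NEW_CONNECTION);
```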
Force-pushed 1571ff6 to e991a0f
Pushed 1571ff6:

Isn't using a …
```rust
) -> Result<(), WireError> {
    // try to connect with manually added peers
    self.maybe_open_connection_with_added_peers()?;
    if self.peers.len() >= T::MAX_OUTGOING_PEERS {
```
Shouldn't we use `connected_peers()`?
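The distinction matters because the peers map also holds entries that are still connecting, so its length overcounts usable peers. A toy sketch of that difference (the `PeerState` enum here is an assumption for illustration, not the node's real type):

```rust
// Assumed, simplified peer state: the real node tracks much more per peer.
#[derive(PartialEq)]
enum PeerState {
    Connecting, // counted by `peers.len()` but not yet usable
    Ready,      // an established, handshaked peer
}

/// Counts only established peers, unlike `peers.len()`,
/// which also counts pending connection attempts.
fn connected_peers(peers: &[PeerState]) -> usize {
    peers.iter().filter(|p| **p == PeerState::Ready).count()
}
```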
Force-pushed e991a0f to 96a1d80
Pushed 96a1d80 with this diff:

```diff
diff --git a/crates/floresta-wire/src/p2p_wire/node/chain_selector_ctx.rs b/crates/floresta-wire/src/p2p_wire/node/chain_selector_ctx.rs
index 068fe30..1ea730f 100644
--- a/crates/floresta-wire/src/p2p_wire/node/chain_selector_ctx.rs
+++ b/crates/floresta-wire/src/p2p_wire/node/chain_selector_ctx.rs
@@ -140,6 +140,10 @@ pub enum PeerCheck {
 impl NodeContext for ChainSelector {
     const REQUEST_TIMEOUT: u64 = 60; // Ban peers stalling our IBD
 
+    // Since we don't have any peers when this starts, we use a more aggressive batch
+    // size to make sure we get our MAX_OUTGOING_CONNECTIONS ASAP
+    const NEW_CONNECTIONS_BATCH_SIZE: usize = 12;
+
     fn get_required_services(&self) -> ServiceFlags {
         ServiceFlags::NETWORK | service_flags::UTREEXO.into() | service_flags::UTREEXO_FILTER.into()
     }
diff --git a/crates/floresta-wire/src/p2p_wire/node/conn.rs b/crates/floresta-wire/src/p2p_wire/node/conn.rs
index 6d7202a..7fa3e34 100644
--- a/crates/floresta-wire/src/p2p_wire/node/conn.rs
+++ b/crates/floresta-wire/src/p2p_wire/node/conn.rs
@@ -573,7 +573,7 @@ where
     ) -> Result<(), WireError> {
         // try to connect with manually added peers
         self.maybe_open_connection_with_added_peers()?;
-        if self.peers.len() >= T::MAX_OUTGOING_PEERS {
+        if self.connected_peers() >= T::MAX_OUTGOING_PEERS {
             return Ok(());
         }
diff --git a/crates/floresta-wire/src/p2p_wire/node/running_ctx.rs b/crates/floresta-wire/src/p2p_wire/node/running_ctx.rs
index e7c91ac..cced16f 100644
--- a/crates/floresta-wire/src/p2p_wire/node/running_ctx.rs
+++ b/crates/floresta-wire/src/p2p_wire/node/running_ctx.rs
@@ -58,10 +58,6 @@ pub struct RunningNode {
 impl NodeContext for RunningNode {
     const REQUEST_TIMEOUT: u64 = 2 * 60;
 
-    // We expect already having several peers after IBD, so we don't need to be too aggressive when
-    // creating new connections
-    const NEW_CONNECTIONS_BATCH_SIZE: usize = 4;
-
     fn get_required_services(&self) -> ServiceFlags {
         ServiceFlags::NETWORK
             | service_flags::UTREEXO.into()
diff --git a/crates/floresta-wire/src/p2p_wire/node_context.rs b/crates/floresta-wire/src/p2p_wire/node_context.rs
index ea3bfe3..224fb5e 100644
--- a/crates/floresta-wire/src/p2p_wire/node_context.rs
+++ b/crates/floresta-wire/src/p2p_wire/node_context.rs
@@ -84,7 +84,7 @@ pub trait NodeContext {
     const MAINTENANCE_TICK: Duration = Duration::from_secs(1);
 
     /// How many connections we try at once
-    const NEW_CONNECTIONS_BATCH_SIZE: usize = 12;
+    const NEW_CONNECTIONS_BATCH_SIZE: usize = 4;
 
     fn get_required_services(&self) -> ServiceFlags {
         ServiceFlags::NETWORK |
```
```rust
periodic_job!(
    self.last_feeler => self.open_feeler_connection(),
    RunningNode::FEELER_INTERVAL,
);
```

```rust
periodic_job!(
    self.last_feeler => self.open_feeler_connection(),
    SyncNode::FEELER_INTERVAL,
);
```
There's a problem here: if we …

Righttt, what about a small PR only adding …

Hold my coffee, will craft one in a jiffy

Holding your cubata
All peers that are added by the user — those added with the RPC
`addnode` or the CLI option `--connect`, should have kind `Manual`,
rather than `Regular`. This exempts them from:

- Max peers quota — if we already have 10 peers and the user tries
  to addnode another one, we will now have 11 peers
- They are whitelisted and shouldn't be banned for misbehaving
- They don't need to have any required service

This commit adds this new variant, and makes sure we are following
the above rules.
Manual connections are whitelisted and should not be banned, even if they misbehave. This commit makes sure we are not doing so. I've removed all the places where we just disconnect a misbehaving peer with a call to `increase_banscore`, using the maximum `banscore` as the increase amount (so they would be banned right away).
Sometimes when we initialize new connections, our `good addresses` table might not be well populated. This causes our node to attempt the "normal" addresses — the ones we still haven't tried. As a consequence, our node will need to attempt several addresses before finding a reachable one. This commit creates connection attempts in batches of 10, so we try more addresses in a shorter period of time. To make sure we don't keep too many in-flight connections at a time, I've decreased the connection timeout, so only those who reply in a timely manner will be kept. I've also added logic to `handle_peer_ready` to disconnect peers if we already have `MAX_OUTGOING_PEERS`.
When we use `periodic_jobs` with `maybe_create_connections` we might create a lot of `NoAddressAvailable` errors, especially on signet/testnet/regtest. This offers little information and pollutes the terminal. This commit uses the `no_log` attribute from `periodic_job` to avoid logging these errors to stdout.
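The effect of suppressing that error can be sketched as a filter that treats `NoAddressAvailable` as a benign outcome. The enum here is a stand-in for illustration, not the crate's full `WireError`:

```rust
// Stand-in error type: the real `WireError` has many more variants.
#[derive(Debug, PartialEq)]
enum WireError {
    NoAddressAvailable,
    Other(&'static str),
}

/// `NoAddressAvailable` is expected on sparse networks (signet/testnet/regtest),
/// so it carries little signal; everything else still gets reported.
fn should_log(err: &WireError) -> bool {
    !matches!(err, WireError::NoAddressAvailable)
}
```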
Force-pushed 96a1d80 to 4ce49a7
I've incorporated #866 and it seems to fix #865 (comment). Marking this as draft until #866 gets merged.
```rust
        T::MAX_OUTGOING_PEERS
    );

    self.send_to_peer(peer, NodeRequest::Shutdown)?;
```
Note we have just sent a `GetAddresses` to this node, but then we disconnect. Idk if we will receive the addresses at all, or just disconnect.
A random idea I got: maybe we can convert these excess peers into feeler peers. We get the addresses and then disconnect.
Idk if this is a good idea tho.
Description and Notes

Sometimes when we initialize new connections, our `good addresses` table
might not be well populated. This causes our node to attempt the "normal"
addresses — the ones we still haven't tried. As a consequence, our node will
need to attempt several addresses before finding a reachable one.

This commit creates connection attempts in batches of 10, so we try
more addresses in a shorter period of time. To make sure we don't
keep too many in-flight connections at a time, I've decreased the connection
timeout, so only those who reply in a timely manner will be kept.

I've also added logic to `handle_peer_ready` to disconnect peers if
we already have `MAX_OUTGOING_PEERS`.

Finally, I've made the `periodic_job` that creates new connections
`no_log`, so we stop polluting the terminal.